Add submission api for web #334

Merged
yangw-dev merged 29 commits into main from addapiweb
Aug 28, 2025

@yangw-dev
Collaborator

@yangw-dev yangw-dev commented Aug 22, 2025

Description

Add an API server for the gpumode website. The API returns sub_id and sub_job_status_id immediately after the job is enqueued with the background manager; the website then polls periodically for updates.

Adds a queue mechanism that accepts run jobs from API requests to /submission/ and runs them as backend jobs.
The gpumode website can then poll the DB table to check the status of the submission run.
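
To make the enqueue-then-poll flow concrete, here is a minimal sketch using only asyncio. It is not the code from this PR: the class name, the status strings ("pending", "running", "succeeded", "failed") and the in-memory `statuses` dict standing in for the DB table are all assumptions for illustration.

```python
import asyncio
import itertools
from dataclasses import dataclass


@dataclass
class JobStatus:
    sub_id: int
    sub_job_status_id: int
    status: str = "pending"  # hypothetical status value


class BackgroundManagerSketch:
    """Hypothetical stand-in for a background submission manager."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()
        self._sub_ids = itertools.count(1)
        self._status_ids = itertools.count(100)
        self.statuses = {}  # stands in for the DB table the website polls
        self._worker = None

    def start(self) -> None:
        # Must be called while an event loop is running.
        self._worker = asyncio.create_task(self._run())

    def enqueue(self, run_job) -> JobStatus:
        """Record the job as pending and return its ids without running it."""
        record = JobStatus(next(self._sub_ids), next(self._status_ids))
        self.statuses[record.sub_job_status_id] = record
        self._queue.put_nowait((record, run_job))
        return record

    async def _run(self) -> None:
        while True:
            record, run_job = await self._queue.get()
            record.status = "running"
            try:
                await run_job()
                record.status = "succeeded"
            except Exception:
                record.status = "failed"
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait for queued jobs to finish, then stop the worker."""
        await self._queue.join()
        self._worker.cancel()
        await asyncio.gather(self._worker, return_exceptions=True)


async def demo():
    manager = BackgroundManagerSketch()
    manager.start()

    async def fake_run():
        await asyncio.sleep(0.01)  # placeholder for the real submission run

    record = manager.enqueue(fake_run)
    status_at_enqueue = record.status  # the caller gets ids back right away
    await manager.drain()
    return status_at_enqueue, manager.statuses[record.sub_job_status_id].status
```

In this sketch the caller receives the ids while the job is still pending, and a later lookup by sub_job_status_id shows the final status, which mirrors the polling described above.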

Why not fastapi background task?

https://fastapi.tiangolo.com/tutorial/background-tasks/

The FastAPI background task is fine when the background job takes <= 30 seconds, such as sending emails or notifications. For us it is not ideal, since it uses the same event loop.
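
The event-loop concern can be illustrated with plain asyncio, independent of FastAPI. This is a minimal sketch under the assumption that the job ends up executing blocking code on the loop; the function names and the timings are made up for the demonstration. A ticker coroutine counts how often it gets to run while a blocking job executes, first on the loop itself and then offloaded with `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import time


def blocking_job(seconds: float) -> None:
    # Placeholder for long-running, blocking work.
    time.sleep(seconds)


async def count_ticks(stop: asyncio.Event, interval: float) -> int:
    """Count how many times this coroutine is scheduled before `stop` is set."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(interval)
        ticks += 1
    return ticks


async def measure(offload: bool, job_seconds: float = 0.3, interval: float = 0.02) -> int:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop, interval))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        # Runs in a worker thread, so the loop keeps serving the ticker.
        await asyncio.to_thread(blocking_job, job_seconds)
    else:
        # Runs on the event loop thread, so nothing else is scheduled meanwhile.
        blocking_job(job_seconds)
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks while blocking the loop:", asyncio.run(measure(offload=False)))
    print("ticks while offloaded:", asyncio.run(measure(offload=True)))
```

The ticker barely advances while the loop is blocked, which is the same starvation other requests would see if a long job ran on the server's event loop.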

Long term

Ideally we should handle submission jobs as background jobs using RQ or Celery, but due to the time limit this is a compromise solution. Some improvements can be made in follow-up PRs:

  1. allow the queue to be recovered from the DB data
  2. handle shutdown more gracefully (for instance, let running jobs finish when Heroku restarts)
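
The two follow-ups above could look roughly like the sketch below. It is only an illustration: the row shape, the status strings and both function names are hypothetical, and whether a job that was already "running" is safe to re-run is a design decision this sketch does not settle.

```python
import asyncio

# Hypothetical status values; the real table may use different ones.
UNFINISHED = {"pending", "running"}


def jobs_to_recover(rows):
    """Pick rows that never reached a terminal state, oldest id first."""
    return sorted((r for r in rows if r["status"] in UNFINISHED), key=lambda r: r["id"])


async def shutdown_gracefully(queue: asyncio.Queue, workers, timeout: float) -> bool:
    """Wait up to `timeout` seconds for queued jobs, then stop the workers.

    Returns True if the queue drained before the timeout.
    """
    try:
        await asyncio.wait_for(queue.join(), timeout)
        drained = True
    except asyncio.TimeoutError:
        drained = False
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return drained


async def demo():
    # Rows as they might be read from the DB after a restart (made-up data).
    rows = [
        {"id": 3, "status": "running"},
        {"id": 1, "status": "succeeded"},
        {"id": 2, "status": "pending"},
    ]
    queue = asyncio.Queue()
    for row in jobs_to_recover(rows):
        queue.put_nowait(row)

    finished = []

    async def worker():
        while True:
            row = await queue.get()
            await asyncio.sleep(0.01)  # placeholder for the real run
            finished.append(row["id"])
            queue.task_done()

    workers = [asyncio.create_task(worker())]
    drained = await shutdown_gracefully(queue, workers, timeout=1.0)
    return drained, finished
```

The timeout is there because a platform restart usually gives the process only a limited grace period, so waiting indefinitely for running jobs is not an option.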

Alternatively, we should change how we interact with the GitHub runner: instead of continuously listening, we should provide a hook that lets the workflow trigger the important updates, including uploading the result.
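
As a rough idea of what such a hook could do on the server side, here is a framework-free sketch. Everything in it is an assumption: the payload fields, the shared-token check, the status strings and the function name are invented for illustration and do not describe an existing endpoint.

```python
import hmac

# Hypothetical terminal status values.
TERMINAL = {"succeeded", "failed"}


def handle_workflow_callback(statuses: dict, payload: dict, token: str, expected_token: str) -> bool:
    """Apply a status update pushed by the workflow; return True if it was applied."""
    # Constant-time comparison of the shared secret (ASCII tokens assumed).
    if not hmac.compare_digest(token, expected_token):
        return False
    job_id = payload.get("sub_job_status_id")
    if job_id not in statuses:
        return False
    record = statuses[job_id]
    if record["status"] in TERMINAL:
        return False  # ignore duplicate or late callbacks
    record["status"] = payload["status"]
    record["result"] = payload.get("result")
    return True


# Example with made-up data: one job waiting for its workflow to report back.
statuses = {7: {"status": "running", "result": None}}
applied = handle_workflow_callback(
    statuses,
    {"sub_job_status_id": 7, "status": "succeeded", "result": {"score": 1.0}},
    token="secret",
    expected_token="secret",
)
```

With a push-style hook like this, the server would no longer need a long-lived listener per run; it would only react when the workflow reports in.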

Checklist

Before submitting this PR, ensure the following steps have been completed:

  • Run the slash command /verifyruns on your own server.
    • Run the cluster bot on your server:
      python discord-bot.py
    • Start training runs with the slash command /verifyruns.
    • Verify that the bot eventually responds with:
      ✅ All runs completed successfully!
      
      (It may take a few minutes for all runs to finish. In particular, the GitHub
      runs may take a little longer. The Modal run is typically quick.)
      For more information on running a cluster bot on your own server, see
      README.md.

@yangw-dev yangw-dev requested a review from S1ro1 August 23, 2025 22:26
@yangw-dev
Collaborator Author

@S1ro1
Hmm, so I followed the guidance to set up my local env, but /verifyruns still fails; not really sure how to verify that.
Need some guidance.

But feel free to start the server and run this against the database:

INSERT INTO leaderboard.user_info (id,user_name, web_auth_id) VALUES ('123','alice', '234');
INSERT INTO leaderboard.gpu_type (leaderboard_id, gpu_type) VALUES (340, 'L4');
UPDATE leaderboard.leaderboard SET deadline = TIMESTAMPTZ '2026-09-01 23:59:59+00' WHERE id = 340;

Then trigger the API:

 curl -N  -X POST "http://127.0.0.1:8000/submission/grayscale/L4/test" \
  -H "X-Web-Auth-Id: 234"
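
For anyone who prefers Python over curl, the same request can be built with the standard library. This is a sketch only: the function name is made up, and the parameter names `leaderboard`, `gpu_type` and `mode` are my reading of the path segments grayscale/L4/test, not names confirmed by the PR.

```python
import urllib.request


def build_submission_request(base_url: str, leaderboard: str, gpu_type: str,
                             mode: str, web_auth_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl command."""
    url = f"{base_url}/submission/{leaderboard}/{gpu_type}/{mode}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-Web-Auth-Id": web_auth_id},
    )


req = build_submission_request("http://127.0.0.1:8000", "grayscale", "L4", "test", "234")
```

Sending it with `urllib.request.urlopen(req)` requires the local server to be running with the rows above inserted.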

@yangw-dev
Collaborator Author

I set up this queue process because of the time limit. It's not perfect, but it is what fits for now.

In the long term we should consider these options:

  1. passively let GitHub call an API we provide to push run updates, instead of us continuously listening
    or
  2. introduce a Redis queue to handle the queue and background jobs properly

@yangw-dev yangw-dev marked this pull request as ready for review August 24, 2025 00:11
@yangw-dev yangw-dev requested review from S1ro1 and ngc92 August 26, 2025 04:34
@yangw-dev yangw-dev changed the title from "echo variables" to "Add submission api for web" Aug 26, 2025
@github-actions

github-actions bot commented Aug 27, 2025

Coverage report

File                                  Lines missing
src/libkernelbot
  backend.py                          59
  background_submission_manager.py    157-158, 184-187, 205-210, 227-252
  db_types.py                         7
  leaderboard_db.py                   377-388, 402-420
  utils.py
Project Total

This report was generated by python-coverage-comment-action

Member

@S1ro1 S1ro1 left a comment

Not 100% happy with the complexity it adds, but if it works we can always fix it later (we never do). Tested locally with CLI, works just fine

@yangw-dev yangw-dev merged commit 9926095 into main Aug 28, 2025
5 checks passed